Hierarchical Divisive Clustering with Multi View-Point Based Similarity Measure

Authors

  • S. Jayaprada
  • Amarapini Aswani
  • G. Gayathri
Abstract

All clustering methods have to assume some cluster relationship among the data objects to which they are applied. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multi-viewpoint based similarity measure and two related clustering methods. The major difference between a traditional dissimilarity/similarity measure and ours is that the former uses only a single viewpoint, which is the origin, while the latter uses many different viewpoints, which are objects assumed not to be in the same cluster as the two objects being measured. Using multiple viewpoints, a more informative assessment of similarity can be achieved. Theoretical analysis and an empirical study are conducted to support this claim. Two criterion functions for document clustering are proposed based on this new measure. We compare them with several well-known clustering algorithms that use other popular similarity measures on various document collections to verify the advantages of our proposal.

Keywords: Data Mining, Clustering, Similarity Measure, Histograms, Parser.
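
The abstract does not state the formula for the measure, so the following is only a minimal sketch under an assumption: that the similarity of two objects is the average, over all viewpoints, of the dot product of their difference vectors taken from each viewpoint. The function name `mvs_similarity` and its parameters are hypothetical and are not taken from the paper. With the origin as the only viewpoint, the sketch reduces to the plain dot product, which corresponds to the single-viewpoint case described above.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mvs_similarity(d_i: Vector, d_j: Vector, viewpoints: Sequence[Vector]) -> float:
    """Assumed multi-viewpoint similarity (hypothetical helper).

    For each viewpoint d_h, look at d_i and d_j "from" d_h by forming the
    difference vectors (d_i - d_h) and (d_j - d_h), take their dot product,
    and average the results over all viewpoints. The viewpoints are meant
    to be objects assumed to lie outside the cluster of d_i and d_j.
    """
    if not viewpoints:
        raise ValueError("at least one viewpoint is required")
    total = 0.0
    for d_h in viewpoints:
        u_i = [x - h for x, h in zip(d_i, d_h)]
        u_j = [x - h for x, h in zip(d_j, d_h)]
        total += dot(u_i, u_j)
    return total / len(viewpoints)


if __name__ == "__main__":
    a = [1.0, 0.0]
    b = [0.8, 0.6]
    # Single viewpoint at the origin: same as the ordinary dot product.
    print(mvs_similarity(a, b, [[0.0, 0.0]]))
    # Several viewpoints taken from (assumed) other clusters.
    print(mvs_similarity(a, b, [[-1.0, 0.0], [0.0, -1.0]]))
```

Choosing which objects serve as viewpoints is left to the caller here; in a clustering setting they would come from the current cluster assignment.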

Similar resources

Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis

Clustering is one of the most interesting and important tools for research in data mining and other disciplines. The aim of clustering is to find the relationships among the data objects and classify them into meaningful subgroups. The effectiveness of a clustering algorithm depends on how appropriate the similarity measure is for the data being compared. This pap...

Hierarchical Time-Series Clustering for Data Streams

This paper presents a time-series whole clustering system that incrementally constructs a hierarchy of clusters. The Online Divisive-Agglomerative Clustering (ODAC) system is an incremental implementation of divisive analysis clustering, using the correlation between time series as the similarity measure. The system tests existing clusters in descending order of diameter, looking for a possible bina...

Algorithm for Hierarchical Multi-way Divisive Clustering of Document Collections

This paper proposes a novel hierarchical divisive clustering algorithm that generates a multi-branch tree, rather than a binary one, as its output. To make the algorithm usable for clustering large document sets, a spherical k-means clustering algorithm based on a cosine measure is adopted to recursively partition the document set from top to bottom. Also, by selecting automatically the nu...

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

Merging Similarity and Trust Based Social Networks to Enhance the Accuracy of Trust-Aware Recommender Systems

In recent years, collaborative filtering (CF) methods have become important and widely accepted techniques for recommender systems. One of these techniques is user-based: it produces useful recommendations based on the similarity of the ratings of like-minded users. However, these systems suffer from several inherent shortcomings such as data sparsity and cold start problems. With the dev...

Publication date: 2013